We investigate the nondeterministic state complexity of basic operations for suffix-free regular
languages. The nondeterministic state complexity of an operation is the number of states that are
necessary and sufficient in the worst case for a minimal nondeterministic finite-state automaton that
accepts the language obtained from the operation. We consider basic operations (catenation, union,
intersection, Kleene star, reversal and complementation) and establish matching upper and lower
bounds for each operation except complementation, where the upper and lower bounds differ by an
additive constant of two.
1 Introduction

Codes are useful in information processing, data compression, cryptography and information
transmission [18]. Some well-known examples are prefix codes, suffix codes, bifix codes and infix codes.
People use different codes for different application domains based on the characteristics of each code
[1, 18]. Since a code is a language, the conditions that classify codes define subfamilies of language
families. For regular languages, for example, the prefix-freeness of prefix codes defines the family of
prefix-free regular languages, which is a proper subfamily of regular languages. Prefix-freeness is
fundamental in coding theory; for example, Huffman codes are prefix-free sets. The advantage of
prefix-free codes is that we can decode a given encoded string deterministically. Suffix codes are the
symmetric counterpart of prefix codes; given a prefix code, its reversal is always a suffix code. However,
suffix codes have their own unique characteristics and are not always completely symmetric to prefix
codes. For instance, a finite-state automaton (FA) is prefix-free if and only if it has no out-transitions
from any final state. If we think of a reversal of this FA, we can think of an FA whose start state has no
in-transitions. However, this condition is necessary but not sufficient for being suffix-free. Thus, we
often need to examine the suffix-free case separately.

Regular languages are given by FAs or regular expressions. There are two main types of FAs:
deterministic finite-state automata (DFAs) and nondeterministic finite-state automata (NFAs). NFAs
provide exponential savings in space compared with DFAs, but the problem of converting a given DFA to
an equivalent minimal NFA is PSPACE-complete [14]. For finite languages, Salomaa and Yu [22] showed
that $O(k^{n/(\log_2 k + 1)})$ is a tight bound for converting an $n$-state NFA to a DFA, where $k$ is
the size of the input alphabet.

There are at least two different models for the state complexity of operations: The deterministic state 
complexity model considers minimal DFAs and the nondeterministic state complexity considers minimal 
NFAs. 

Yu et al. [25, 26] investigated the deterministic state complexity for various operations on regular
languages. As special cases of state complexity, Campeanu et al. [3] and Han and Salomaa [7]
examined the deterministic state complexity of finite languages. Pighizzini and Shallit [20] investigated
the deterministic state complexity of unary language operations. Moreover, Han et al. [10] studied the
deterministic state complexity of prefix-free regular languages and Han and Salomaa [8] looked into the
deterministic state complexity of suffix-free regular languages. After writing this paper, we found
out that Jiraskova and Olejar [17] have also considered the nondeterministic state complexity of union
and intersection for suffix-free languages. They have established a tight bound for union and intersection
using binary languages. There are several other results with respect to the state complexity of various
operations [4, 5, 21].

Holzer and Kutrib [12] studied the nondeterministic state complexity of regular languages. Jirasek et
al. [15] examined the nondeterministic state complexity of complementation of regular languages.
Recently, Han et al. [9] investigated the nondeterministic state complexity of prefix-free regular languages.
As a continuation of our research for the operational nondeterministic state complexity of subfamilies 
of regular languages, we consider the nondeterministic state complexity of suffix-free regular languages. 
Since suffix codes are one of the fundamental classes of codes, it is important to calculate the precise 
bounds. Moreover, determining the state complexity of operations on fundamental subfamilies of the 
regular languages can provide valuable insights on connections between restrictions placed on language 
definitions and descriptional complexity. 

In Section 2, we define some basic notions. In Section 3, we examine the worst-case nondeterministic
state complexity of basic operations (union, catenation, intersection, Kleene star, reversal and
complementation) of suffix-free regular languages. Except for the complementation operation, we prove
that the results are tight by giving general lower bound examples that match the upper bounds.

We give a comparison table between the deterministic state complexity and the nondeterministic state
complexity in Section 4.

2 Preliminaries 

Let $\Sigma$ denote a finite alphabet of characters and $\Sigma^*$ denote the set of all strings over
$\Sigma$. The size $|\Sigma|$ of $\Sigma$ is the number of characters in $\Sigma$. A language over
$\Sigma$ is any subset of $\Sigma^*$. The symbol $\emptyset$ denotes the empty language and the symbol
$\lambda$ denotes the null string. For strings $x$, $y$ and $z$, we say that $x$ is a suffix of $y$ if
$y = zx$. We define a (regular) language $L$ to be suffix-free if a string $x \in L$ is not a suffix of
any other string in $L$. Given a string $x$ in a set $X$ of strings, let $x^R$ be the reversal of $x$,
in which case $X^R = \{x^R \mid x \in X\}$.
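For a finite set of strings, the definition of suffix-freeness can be checked directly. The following Python sketch is only an illustration; the names `is_suffix_free` and `reverse_language` are hypothetical helpers introduced here, and the sample sets are made up.

```python
def is_suffix_free(words):
    """Return True if no string of the finite set is a suffix of another string in it."""
    words = set(words)
    return not any(x != y and y.endswith(x) for x in words for y in words)


def reverse_language(words):
    """Return X^R = {x^R | x in X} for a finite set X of strings."""
    return {w[::-1] for w in words}


# A prefix-free sample set and its reversal, which is suffix-free.
prefix_code = {"ab", "aab"}
print(is_suffix_free(reverse_language(prefix_code)))
# A set containing the null string and another string is never suffix-free.
print(is_suffix_free({"", "a"}))
```

The second call reflects the fact that the null string is a suffix of every string, so a suffix-free language containing it cannot contain anything else.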

An FA $A$ is specified by a tuple $(Q, \Sigma, \delta, s, F)$, where $Q$ is a finite set of states,
$\Sigma$ is an input alphabet, $\delta : Q \times \Sigma \to 2^Q$ is a transition function, $s \in Q$ is
the start state and $F \subseteq Q$ is a set of final states. If $F$ consists of a single state $f$, then
we use $f$ instead of $\{f\}$ for simplicity. Let $|Q|$ be the number of states in $Q$. We define the
size $|A|$ of $A$ to be the number of states in $A$; namely $|A| = |Q|$. For a transition
$q \in \delta(p, a)$ in $A$, we say that $p$ has an out-transition and $q$ has an in-transition.
Furthermore, $p$ is a source state of $q$ and $q$ is a target state of $p$. We say that $A$ is
non-returning if the start state of $A$ does



Y.-S. Han and K. Salomaa 






not have any in-transitions and $A$ is non-exiting if no final state of $A$ has an out-transition. If
$\delta(q, a)$ has a single element $q'$, then we write $\delta(q, a) = q'$ instead of
$\delta(q, a) = \{q'\}$ for simplicity.

A string $x$ over $\Sigma$ is accepted by $A$ if there is a labeled path from $s$ to a final state such
that this path spells out $x$. We call this path an accepting path. Then, the language $L(A)$ of $A$ is
the set of all strings
spelled out by accepting paths in A. We say that a state of A is useful if it appears in an accepting path in 
A; otherwise, it is useless. Unless otherwise mentioned, in the following we assume that all states of an 
FA are useful. 
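The definitions above can be made concrete with a small simulation. The sketch below assumes an NFA is stored as a tuple `(states, delta, start, finals)` with `delta` a dictionary from `(state, symbol)` to a set of states; this encoding and the function names are our own choices for illustration. The sample automaton is a 3-state NFA for $b(aa)^*$.

```python
def accepts(nfa, word):
    """Return True if some labeled path from the start state to a final state spells out word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def is_non_returning(nfa):
    """The start state has no in-transitions."""
    _, delta, start, _ = nfa
    return all(start not in targets for targets in delta.values())


def is_non_exiting(nfa):
    """No final state has an out-transition."""
    _, delta, _, finals = nfa
    return all(not targets for (p, _), targets in delta.items() if p in finals)


# Sample NFA: 0 -b-> 1, 1 -a-> 2, 2 -a-> 1, start state 0, final state 1.
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
print(accepts(A, "baa"), accepts(A, "ba"))
print(is_non_returning(A), is_non_exiting(A))
```

The sample automaton is non-returning but not non-exiting, which matches the observation that its language is suffix-free but not prefix-free.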

We say that an FA $A$ is a suffix-free FA if $L(A)$ is suffix-free. Notice that a suffix-free FA must
be non-returning by definition. We assume that a given NFA has no $\lambda$-transitions since we can
always transform an $n$-state NFA with $\lambda$-transitions to an equivalent $n$-state NFA without
$\lambda$-transitions [13].

For complete background knowledge in automata theory, the reader may refer to textbooks [13, 23, 24].

Before tackling the problem, we present a technique that gives a lower bound for the size of
NFAs and establish a lemma that is crucial for proving the tight bounds on the nondeterministic state
complexity in the following sections. Notice that an FA for a non-trivial suffix-free regular language
$L$ (namely, $L \neq \{\lambda\}$) must have at least 2 states since such an FA needs at least one start
state and one final state.

Proposition 1 (The fooling set technique [2, 6]) Let $L \subseteq \Sigma^*$ be a regular language.
Suppose that there exists a set of pairs

$P = \{(x_i, w_i) \mid 1 \leq i \leq n\}$

such that

1. For all $i$ with $1 \leq i \leq n$, we have $x_i w_i \in L$;

2. For all $i, j$ with $1 \leq i, j \leq n$ and $i \neq j$, at least one of $x_i w_j \notin L$ and
$x_j w_i \notin L$ holds.

Then, a minimal NFA for $L$ has at least $n$ states.

The set $P$ satisfying the conditions of Proposition 1 is called a fooling set for $L$. The fooling set
technique was first proposed by Birget [2]. A related technique was considered by Glaister and Shallit [6].

Lemma 2 Let $n \geq 2$ be an arbitrary integer. A minimal NFA of the suffix-free language
$L_1 = L(b(a^{n-1})^*)$ with $n \geq 2$ or of the suffix-free language $L_2 = L(b(a^{n-2})^*b)$ with
$n \geq 3$ has $n$ states.
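The fooling set conditions are easy to test mechanically. As an illustration, the Python sketch below checks a candidate fooling set of size $n$ for the first language of Lemma 2, read here as $b(a^{n-1})^*$. The pairs $(\lambda, b)$ and $(ba^i, a^{n-1-i})$ for $0 \leq i \leq n-2$ are our own choice and are not claimed to be the pairs used in the original proof; all function names are hypothetical.

```python
def is_fooling_set(pairs, in_language):
    """Check conditions 1 and 2 of the fooling set technique for a list of (x, w) pairs."""
    for i, (x_i, w_i) in enumerate(pairs):
        if not in_language(x_i + w_i):
            return False
        for x_j, w_j in pairs[i + 1:]:
            # Condition 2 fails only if both mixed strings are in the language.
            if in_language(x_i + w_j) and in_language(x_j + w_i):
                return False
    return True


def in_L1(word, n):
    """Membership in b(a^(n-1))^*, assuming n >= 2."""
    tail = word[1:]
    return word[:1] == "b" and set(tail) <= {"a"} and len(tail) % (n - 1) == 0


def candidate_pairs(n):
    """(lambda, b) together with (b a^i, a^(n-1-i)) for 0 <= i <= n-2."""
    return [("", "b")] + [("b" + "a" * i, "a" * (n - 1 - i)) for i in range(n - 1)]


n = 5
pairs = candidate_pairs(n)
print(len(pairs), is_fooling_set(pairs, lambda w: in_L1(w, n)))
```

Under these assumptions the check succeeds with $n$ pairs, which gives the lower bound $n$ of Proposition 1 for this language.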

We use $\mathrm{NSC}(L)$ to denote the number of states of a minimal NFA for $L$; namely, $\mathrm{NSC}(L)$ is the
nondeterministic state complexity of L. 

3 State Complexity 

We first examine the nondeterministic state complexity of binary operations (union, catenation and inter- 
section) for suffix-free regular languages. Then, we study the unary operation cases (Kleene star, reversal 
and complementation). We rely on a unique structural property of a suffix-free FA for obtaining upper 
bounds: The start state does not have any in-transitions (the non-returning property). 
3.1 Union

Han and Salomaa [8] showed that $mn - (m + n) + 2$ is the state complexity of the union of an $m$-state
suffix-free DFA and an n-state suffix-free DFA using the Cartesian product of states. For the NFA state 
complexity, we directly construct an NFA for the union of two suffix-free regular languages without the 
Cartesian product. The construction relies on nondeterminism and the fact that the computation of a 
suffix-free FA cannot return to the start state. 

Theorem 3 Given two suffix-free regular languages $L_1$ and $L_2$, the nondeterministic state complexity
$\mathrm{NSC}(L_1 \cup L_2)$ for $L_1 \cup L_2$ is $m + n - 1$, where $m = \mathrm{NSC}(L_1)$,
$n = \mathrm{NSC}(L_2)$, $m, n \geq 2$ and $|\Sigma| \geq 2$.
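One natural way to obtain the upper bound $m + n - 1$ is to merge the two start states, which is safe because neither computation can return to its start state. The Python sketch below illustrates this idea under our own assumptions: the tuple encoding `(states, delta, start, finals)`, the function names and the two sample automata are hypothetical, and the construction is not claimed to be the exact one used in the proof of the theorem.

```python
def accepts(nfa, word):
    """Simulate an NFA given as (states, delta, start, finals) on word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def union_non_returning(A, B, start="s"):
    """Union of two non-returning NFAs obtained by merging their start states."""
    states, delta, finals = {start}, {}, set()
    for tag, (Q, d, s, F) in (("A", A), ("B", B)):
        # Tag states to keep the automata disjoint; both start states become `start`.
        def rename(q, tag=tag, s=s):
            return start if q == s else (tag, q)
        states |= {rename(q) for q in Q}
        finals |= {rename(q) for q in F}
        for (p, a), targets in d.items():
            delta.setdefault((rename(p), a), set()).update(rename(q) for q in targets)
    return states, delta, start, finals


# Sample inputs: A accepts b(aa)^*, B accepts b a^* b (3 states each).
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
B = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {1}, (1, "b"): {2}}, 0, {2})
U = union_non_returning(A, B)
print(len(U[0]))  # 5 = m + n - 1 for m = n = 3
```

If either start state had an in-transition, a computation could re-enter the merged state and switch automata, so the non-returning property is essential here.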

3.2 Catenation 

For the catenation operation, $(2m - 1)2^{n-1}$ is the state complexity for the DFA case [26] and
$m + n$ is the state complexity for the NFA case [12]. Thus, there is an exponential gap between the two
cases. For prefix-free regular languages, the state complexity is linear in the sizes of the component
automata in both the DFA and NFA cases because of a unique structural property of a prefix-free
automaton [9]. The deterministic state complexity of the catenation of suffix-free regular languages is
$(m - 1)2^{n-2} + 1$ [8].

Theorem 4 Given two suffix-free regular languages $L_1$ and $L_2$, the nondeterministic state complexity
$\mathrm{NSC}(L_1 L_2)$ for $L_1 L_2$ is $m + n - 1$, where $m = \mathrm{NSC}(L_1)$ and
$n = \mathrm{NSC}(L_2)$.
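A plausible way to save one state in the catenation is to drop the start state of the second automaton and let the final states of the first automaton take over its out-transitions; this works because the second automaton is non-returning. The following Python sketch illustrates the idea. The encoding, the function names and the sample automata are assumptions made for this example, and the construction is a sketch rather than the proof of the theorem.

```python
def accepts(nfa, word):
    """Simulate an NFA given as (states, delta, start, finals) on word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def concat_non_returning(A, B):
    """NFA for L(A)L(B) with |A| + |B| - 1 states, assuming B is non-returning."""
    (QA, dA, sA, FA), (QB, dB, sB, FB) = A, B
    states = {("A", q) for q in QA} | {("B", q) for q in QB if q != sB}
    delta = {}
    for (p, a), targets in dA.items():
        delta.setdefault((("A", p), a), set()).update(("A", q) for q in targets)
    for (p, a), targets in dB.items():
        # B is non-returning, so sB never occurs among the targets.
        sources = [("A", f) for f in FA] if p == sB else [("B", p)]
        for src in sources:
            delta.setdefault((src, a), set()).update(("B", q) for q in targets)
    finals = {("B", q) for q in FB if q != sB}
    if sB in FB:  # the null string is in L(B): final states of A stay final
        finals |= {("A", f) for f in FA}
    return states, delta, ("A", sA), finals


# Sample inputs: A accepts b(aa)^*, B accepts b a^* b (3 states each).
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
B = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {1}, (1, "b"): {2}}, 0, {2})
C = concat_non_returning(A, B)
print(len(C[0]))  # 5 = m + n - 1 for m = n = 3
```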

3.3 Intersection 

Given two FAs $A = (Q_1, \Sigma, \delta_1, s_1, F_1)$ and $B = (Q_2, \Sigma, \delta_2, s_2, F_2)$, we
can construct an FA $M = (Q_1 \times Q_2, \Sigma, \delta, (s_1, s_2), F_1 \times F_2)$ for the
intersection of $L(A)$ and $L(B)$ based on the Cartesian product of states, where

$\delta((p, q), a) = (\delta_1(p, a), \delta_2(q, a))$ for $p \in Q_1$, $q \in Q_2$ and $a \in \Sigma$.

From the Cartesian product, we know that $mn$ is an upper bound for the intersection of two FAs, where
$m$ and $n$ are the numbers of states of $A$ and $B$. We now examine $M$ and reduce the upper bound
based on the suffix-freeness of the input FAs. Let $A$ and $B$ be suffix-free. This implies that both
$A$ and $B$ are non-returning and, thus, $s_1$ and $s_2$ do not have any in-transitions.

Proposition 5 All states $(s_1, q)$ and $(p, s_2)$, for $p\ (\neq s_1) \in Q_1$ and
$q\ (\neq s_2) \in Q_2$, are unreachable from $(s_1, s_2)$ in $M$ since $L(A)$ and $L(B)$ are suffix-free.

Based on Proposition 5, we remove all unreachable states and reduce the upper bound as follows:

$mn - (m - 1) - (n - 1) = mn - (m + n) + 2$.

Namely, $mn - (m + n) + 2$ states are sufficient for $L(A) \cap L(B)$ when both $A$ and $B$ are
non-returning.
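The counting argument can be observed on a small example. The Python sketch below builds only the reachable part of the Cartesian product for two non-returning NFAs; the tuple encoding, the function names and the sample automata for $b(aa)^*$ and $b(aaa)^*$ are assumptions made for this illustration.

```python
from itertools import product


def accepts(nfa, word):
    """Simulate an NFA given as (states, delta, start, finals) on word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def intersect(A, B, alphabet):
    """Cartesian product of two NFAs, restricted to states reachable from (sA, sB)."""
    (_, dA, sA, FA), (_, dB, sB, FB) = A, B
    start = (sA, sB)
    reachable, frontier, delta = {start}, [start], {}
    while frontier:
        p, q = frontier.pop()
        for a in alphabet:
            targets = set(product(dA.get((p, a), ()), dB.get((q, a), ())))
            if targets:
                delta[((p, q), a)] = targets
            for t in targets - reachable:
                reachable.add(t)
                frontier.append(t)
    finals = {(p, q) for (p, q) in reachable if p in FA and q in FB}
    return reachable, delta, start, finals


# Sample inputs: A accepts b(aa)^* (m = 3), B accepts b(aaa)^* (n = 4).
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
B = ({0, 1, 2, 3}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {3}, (3, "a"): {1}}, 0, {1})
M = intersect(A, B, "ab")
m, n = len(A[0]), len(B[0])
print(len(M[0]), m * n - (m + n) + 2)  # 7 7
```

In this example every reachable state other than the start pair avoids both start states, so exactly $(m - 1)(n - 1) + 1 = mn - (m + n) + 2$ states remain.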

Theorem 6 Given two suffix-free regular languages $L_1$ and $L_2$, the nondeterministic state complexity
$\mathrm{NSC}(L_1 \cap L_2)$ for $L_1 \cap L_2$ is $mn - (m + n) + 2$, where $m = \mathrm{NSC}(L_1)$,
$n = \mathrm{NSC}(L_2)$ and $|\Sigma| \geq 3$.

Theorem 6 considers the case $\mathrm{NSC}(L_1), \mathrm{NSC}(L_2) \geq 2$. If either of them is 1, then
$\mathrm{NSC}(L_1 \cap L_2) = 1$ since the only suffix-free regular language with a single-state NFA is
$\{\lambda\}$. The deterministic state complexity of the intersection of two suffix-free DFAs is
$mn - 2(m + n) + 6$ [8]. The gap between the DFA case and the NFA case is due to the sink state: an NFA
does not need to have a sink state.






3.4 Kleene Star 

We examine the Kleene star operation of suffix-free NFAs. Han and Salomaa [8] investigated the
deterministic state complexity for Kleene star and demonstrated that $2^{m-2} + 1$ states are necessary
and sufficient in the worst case for an $m$-state suffix-free DFA.

Theorem 7 Given a suffix-free regular language $L$, the nondeterministic state complexity
$\mathrm{NSC}(L^*)$ for $L^*$ is $m$, where $m = \mathrm{NSC}(L)$.

3.5 Reversal 

Given an $m$-state NFA $A$, $\mathrm{NSC}(L(A)^R)$ is in general $m + 1$ [11]. If $L(A)$ is prefix-free,
then we know that $\mathrm{NSC}(L(A)^R)$ is $m$ [9].

The upper bound $m + 1$ is based on the simple construction of an NFA for $L^R$ from an NFA $A$ for
$L$: We flip the transition directions, make the start state of $A$ a final state and make all final
states of $A$ start states. Now we have an NFA with multiple start states. We introduce a new start
state and add a $\lambda$-transition from the new start state to each of these start states. Then, we
apply the $\lambda$-transition removal technique [13], which does not change the number of states. Thus,
we have an $(m + 1)$-state NFA for $L^R$.
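The construction just described can be sketched in Python as follows. The $\lambda$-transition removal is folded into the construction: the new start state directly receives the flipped transitions of every old final state. The tuple encoding, the name `reverse_nfa`, the state name `s_new` and the sample automaton are assumptions made for this illustration.

```python
def accepts(nfa, word):
    """Simulate an NFA given as (states, delta, start, finals) on word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def reverse_nfa(A, new_start="s_new"):
    """(m + 1)-state NFA for the reversal; new_start must not already be a state of A."""
    Q, d, s, F = A
    delta = {}
    for (p, a), targets in d.items():
        for q in targets:
            delta.setdefault((q, a), set()).add(p)  # flip p -a-> q into q -a-> p
    # The new start state imitates lambda-moves to all old final states.
    for (q, a), sources in list(delta.items()):
        if q in F:
            delta.setdefault((new_start, a), set()).update(sources)
    # The old start state becomes final; the new one is final iff lambda is in L(A).
    finals = {s} | ({new_start} if s in F else set())
    return set(Q) | {new_start}, delta, new_start, finals


# Sample input: A accepts b(aa)^*, so its reversal is (aa)^* b.
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
R = reverse_nfa(A)
print(len(R[0]), accepts(R, "aab"), accepts(R, "ab"))  # 4 True False
```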

Now we consider a lower bound for reversal. It seems difficult to apply the fooling set method for this
operation. For the lemma below, we use an ad hoc proof that has been modified from the corresponding
argument used in Holzer and Kutrib [12] for the reversal of general regular languages.

Lemma 8 Let $\Sigma = \{a, b, c, d\}$ and $m \geq 4$. There exists a suffix-free regular language $L$
over $\Sigma$ with $\mathrm{NSC}(L) \leq m$ such that $\mathrm{NSC}(L^R) = m + 1$.

In the construction used for Lemma 8, when $m \geq 4$, the symbol $d$ can be replaced by $b$ or $c$. We have
stated the construction using a four-letter alphabet for the sake of easier readability. We do not know 
whether the lower bound m + 1 can be reached by the reversal of suffix-free regular languages over a 
two-letter alphabet. 

Using the general upper bound from Holzer and Kutrib [12], Lemma 8 gives the following statement:

Theorem 9 If $L$ is a suffix-free regular language recognized by an NFA with $m$ states, then
$\mathrm{NSC}(L^R) \leq m + 1$. The bound $m + 1$ can be reached by suffix-free languages over a
three-letter alphabet when $m \geq 4$.¹

3.6 Complementation of suffix-free regular languages 

The complementation of NFAs is an expensive operation with respect to state complexity. Meyer and
Fischer [19] already noticed that transforming an $m$-state NFA to a DFA requires $2^m$ states in the
worst case. The complementation of an $m$-state DFA does not require additional states since it simply
interchanges final states and non-final states. Thus, based on the subset construction, we know that
$2^m$ states are sufficient for the complementation of an $m$-state NFA. Jiraskova [16] showed that
$2^m$ states are also necessary, so the bound is tight, when $|\Sigma| = 2$.
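The two steps mentioned above, subset construction followed by interchanging final and non-final states, can be sketched in Python as follows. The encodings and function names are assumptions made for this illustration, and only the reachable subsets are built. For a non-returning NFA, every reachable subset other than the initial one omits the start state, which the example makes visible.

```python
def accepts(nfa, word):
    """Simulate an NFA given as (states, delta, start, finals) on word."""
    _, delta, start, finals = nfa
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), ())}
    return bool(current & finals)


def complement_via_subsets(A, alphabet):
    """Complete DFA for the complement of L(A), built from the reachable subsets."""
    _, d, s, F = A
    start = frozenset([s])
    states, frontier, delta = {start}, [start], {}
    while frontier:
        S = frontier.pop()
        for a in alphabet:
            T = frozenset(q for p in S for q in d.get((p, a), ()))
            delta[(S, a)] = T
            if T not in states:
                states.add(T)
                frontier.append(T)
    # Interchange final and non-final states: a subset is final iff it misses F.
    finals = {S for S in states if not (S & F)}
    return states, delta, start, finals


def dfa_accepts(dfa, word):
    """Run a complete DFA given as (states, delta, start, finals) on word."""
    _, delta, state, finals = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in finals


# Sample input: A accepts b(aa)^* over the alphabet {a, b}.
A = ({0, 1, 2}, {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {1}}, 0, {1})
D = complement_via_subsets(A, "ab")
print(len(D[0]), dfa_accepts(D, "ba"), dfa_accepts(D, "b"))  # 4 True False
```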

Lemma 10 Given an $m$-state suffix-free NFA $A = (Q, \Sigma, \delta, s, F)$, $2^{m-1} + 1$ states are
sufficient for its complement language $\overline{L(A)}$.

¹ An anonymous referee of the paper has suggested a different lower bound construction over a 3-letter
alphabet that also works in the case $m = 3$.
Lemma 11 Let $\Sigma = \{a, b, c\}$ and $L_1 \subseteq \{a, b\}^*$ be a regular language. Let
$L \subseteq \Sigma^*$ be a regular language such that

$L \cap (c \cdot \{a, b\}^*) = c \cdot L_1$.    (1)

Then $\mathrm{NSC}(L) \geq \mathrm{NSC}(L_1) - 1$.

If in the statement of Lemma 11 the language $L$ is suffix-free, the proof implies that
$\mathrm{NSC}(L) \geq \mathrm{NSC}(L_1)$. In this case, the constructed NFA $B$ does not need the start
state of the original NFA $A$ since $A$ is non-returning. However, below, Lemma 11 will be used for
complements of suffix-free languages (which need not be suffix-free), and the bound cannot be improved
in this way.

Lemma 12 Let $\Sigma = \{a, b, c\}$ and $m \geq 2$. There exists a suffix-free regular language
$L \subseteq \Sigma^*$ such that

$\mathrm{NSC}(L) \leq m$ and $\mathrm{NSC}(\overline{L}) \geq 2^{m-1} - 1$.

The results of Lemma 10 and Lemma 12 give the following.

Theorem 13 Given a suffix-free regular language $L$ having an NFA with $m$ states,
$\mathrm{NSC}(\overline{L}) \leq 2^{m-1} + 1$. There exists a suffix-free regular language $L$ over a
three-letter alphabet such that $\mathrm{NSC}(L) = m$ and $\mathrm{NSC}(\overline{L}) \geq 2^{m-1} - 1$.

Theorem 13 gives the precise worst-case nondeterministic state complexity of complementation
within a constant of two. The worst-case example for complementation in Jiraskova [16] uses a binary
alphabet; however, our construction needs an additional symbol to make the languages suffix-free. We
do not know what the nondeterministic state complexity of complementation is for suffix-free languages
over a binary alphabet.

4 Conclusions

We have investigated the nondeterministic state complexity of basic operations for suffix-free regular
languages. We have relied on a unique structural property of a suffix-free FA: The start state does not
have any in-transitions. Based on this property, we have examined the nondeterministic state complexity
with respect to catenation, union, intersection, Kleene star, reversal and complementation. Table 1 shows
the comparison between the deterministic state complexity and the nondeterministic state complexity.

operation          suffix-free DFAs        suffix-free NFAs
$L_1 \cdot L_2$    $(m-1)2^{n-2} + 1$      $m + n - 1$
$L_1 \cup L_2$     $mn - (m+n) + 2$        $m + n - 1$
$L_1 \cap L_2$     $mn - 2(m+n) + 6$       $mn - (m+n) + 2$
$L^*$              $2^{m-2} + 1$           $m$
$L^R$              $2^{m-2} + 1$           $m + 1$
$\overline{L}$     $m$                     $2^{m-1} \pm 1$

Table 1: State complexity of basic operations between suffix-free DFAs and NFAs.